AITopics

Country: North America > United States > Illinois (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.65)

Industry:

Information Technology (1.00)
Education (1.00)
Transportation > Ground > Road (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.86)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.68)

Rossa, Tom, Phillips, Angus, Rainforth, Tom

Action-BED: Task-Driven Bayesian Experimental Design with Singly Intractable Objectives

arXiv.org Machine LearningJun-23-2026

Bayesian experimental design (BED) has traditionally been based on maximising expected uncertainty reductions from prior to posterior. A major shortfall of this approach is that it leads to doubly intractable objectives that are difficult to optimise, while customising them to particular downstream tasks of interest can also be difficult. Following first principles decision theory, we demonstrate that BED can alternatively be formulated in terms of an expected future loss (EFL) on downstream actions, providing a simple and naturally task-driven framework. Critically, we then show that all such EFLs can be rearranged into singly intractable objectives that can be jointly optimised with respect to both the design policy and a downstream action policy using stochastic gradients, an approach we refer to as ACTION-BED. This formulation further sidesteps the need for any explicit posterior or marginal likelihood estimation and is naturally implicit, requiring only the ability to sample from the joint model over model parameters and data, and evaluate the downstream loss function. It thus allows design policies to be learned more effectively, efficiently, and simply than existing methods, while providing easy customisation to different downstream tasks and losses.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Machine Learning

2606.23662

Genre: Research Report (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(2 more...)

Neural Information Processing SystemsJun-14-2026, 06:32:51 GMT

Real-DRL: Teach and Learn at Runtime

This paper introduces the Real-DRL framework for safety-critical autonomous systems, enabling runtime learning of a deep reinforcement learning (DRL) agent to develop safe and high-performance action policies in real plants while prioritizing safety. The Real-DRL consists of three interactive components: a DRL-Student, a PHY-Teacher, and a Trigger. The DRL-Student is a DRL agent that innovates in the dual self-learning and teaching-to-learn paradigm and the safety-status-dependent batch sampling. On the other hand, PHY-Teacher is a physics-model-based design of action policies that focuses solely on safety-critical functions. PHY-Teacher is novel in its real-time patch for two key missions: i) fostering the teaching-to-learn paradigm for DRL-Student and ii) backing up the safety of real plants. The Trigger manages the interaction between the DRL-Student and the PHY-Teacher. Powered by the three interactive components, the Real-DRL can effectively address safety challenges that arise from the unknown unknowns and the Sim2Real gap. Additionally, Real-DRL notably features i) assured safety, ii) automatic hierarchy learning (i.e., safety-first learning and then high-performance learning), and iii) safety-informed batch sampling to address the experience imbalance caused by corner cases. Experiments with a real quadruped robot, a quadruped robot in Nvidia Isaac Gym, and a cart-pole system, along with comparisons and ablation studies, demonstrate the Real-DRL's effectiveness and unique features.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)

Neural Information Processing SystemsFeb-9-2026, 21:44:37 GMT

CaMP: Causal Multi-policy Planning for Interactive Navigation in Multi-room Scenes Xiaohan Wang

Visual navigation has been widely studied under the assumption that there may be several clear routes to reach the goal.

artificial intelligence, machine learning, obstacle, (13 more...)

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Vision (0.70)
Information Technology > Artificial Intelligence > Robots (0.68)

Peter Schulam, Suchi Saria

Reliable Decision Support using Counterfactual Models

Neural Information Processing SystemsNov-21-2025, 06:08:56 GMT

One use of such an estimate is to evaluate risk; e.g. is this patient likely to die if I do not

data mining, machine learning, reinforcement learning, (19 more...)

Country:

North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Nephrology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

arXiv.org Artificial IntelligenceNov-4-2025

Deep reinforced learning enables solving rich discrete-choice life cycle models to analyze social security reforms

Tanskanen, Antti J.

Discrete-choice life cycle models of labor supply can be used to estimate how social security reforms influence employment rate. In a life cycle model, optimal employment choices during the life course of an individual must be solved. Mostly, life cycle models have been solved with dynamic programming, which is not feasible when the state space is large, as often is the case in a realistic life cycle model. Solving a complex life cycle model requires the use of approximate methods, such as reinforced learning algorithms. We compare how well a deep reinforced learning algorithm ACKTR and dynamic programming solve a relatively simple life cycle model. To analyze results, we use a selection of statistics and also compare the resulting optimal employment choices at various states. The statistics demonstrate that ACKTR yields almost as good results as dynamic programming. Qualitatively, dynamic programming yields more spiked aggregate employment profiles than ACKTR. The results obtained with ACKTR provide a good, yet not perfect, approximation to the results of dynamic programming. In addition to the baseline case, we analyze two social security reforms: (1) an increase of retirement age, and (2) universal basic income. Our results suggest that reinforced learning algorithms can be of significant value in developing social security reforms.

artificial intelligence, life cycle model, machine learning, (16 more...)

doi: 10.1016/j.ssaho.2022.100263

2010.13471

Country: Europe > Finland (0.15)

Genre: Research Report > New Finding (0.68)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Social Services (1.00)
Banking & Finance > Economy (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceNov-4-2025

Real-DRL: Teach and Learn in Reality

Mao, Yanbing, Cai, Yihao, Sha, Lui

This paper introduces the Real-DRL framework for safety-critical autonomous systems, enabling runtime learning of a deep reinforcement learning (DRL) agent to develop safe and high-performance action policies in real plants (i.e., real physical systems to be controlled), while prioritizing safety! The Real-DRL consists of three interactive components: a DRL-Student, a PHY-Teacher, and a Trigger. The DRL-Student is a DRL agent that innovates in the dual self-learning and teaching-to-learn paradigm and the real-time safety-informed batch sampling. On the other hand, PHY-Teacher is a physics-model-based design of action policies that focuses solely on safety-critical functions. PHY-Teacher is novel in its real-time patch for two key missions: i) fostering the teaching-to-learn paradigm for DRL-Student and ii) backing up the safety of real plants. The Trigger manages the interaction between the DRL-Student and the PHY-Teacher. Powered by the three interactive components, the Real-DRL can effectively address safety challenges that arise from the unknown unknowns and the Sim2Real gap. Additionally, Real-DRL notably features i) assured safety, ii) automatic hierarchy learning (i.e., safety-first learning and then high-performance learning), and iii) safety-informed batch sampling to address the learning experience imbalance caused by corner cases. Experiments with a real quadruped robot, a quadruped robot in NVIDIA Isaac Gym, and a cart-pole system, along with comparisons and ablation studies, demonstrate the Real-DRL's effectiveness and unique features.

artificial intelligence, equation, machine learning, (16 more...)

2511.00112

Country: North America > United States > Illinois (0.28)

Genre: Research Report (0.82)

Industry:

Information Technology (1.00)
Transportation > Ground > Road (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.87)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.68)

Neural Information Processing SystemsOct-8-2025, 10:12:40 GMT

CaMP: Causal Multi-policy Planning for Interactive Navigation in Multi-room Scenes Xiaohan Wang

Visual navigation has been widely studied under the assumption that there may be several clear routes to reach the goal.

agent, navigation, obstacle, (11 more...)

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Vision (0.70)
Information Technology > Artificial Intelligence > Robots (0.68)

arXiv.org Artificial IntelligenceOct-1-2025

Memory-Driven Self-Improvement for Decision Making with Large Language Models

Yan, Xue, Ou, Zijing, Yang, Mengyue, Song, Yan, Zhang, Haifeng, Li, Yingzhen, Wang, Jun

Large language models (LLMs) have emerged as effective action policies for sequential decision-making (SDM) tasks due to their extensive prior knowledge. However, this broad yet general knowledge is often insufficient for specific decision-making tasks with limited task-related data, making it challenging to efficiently adapt LLMs to specific SDM tasks. To address this challenge, we propose a memory-driven self-improvement framework that combines LLM general prior knowledge with a compact memory of domain-specific experiences. Memory retains past interactions and associated Q-values, thereby capturing decision-relevant knowledge that facilitates accurate value estimation and informs the LLM prior refinement. The refined LLM prior, in turn, generates higher-reward trajectories that further enrich memory, forming a natural self-improvement framework where memory and LLM prior mutually reinforce each other. Experiments show that our memory-driven approach significantly outperforms both traditional RL and LLM-based baselines, e.g., improving performance by over 40\% on in-distribution tasks and over 75\% when generalized to unseen tasks in ALFWorld.

artificial intelligence, large language model, natural language, (17 more...)

2509.2634

Country: Europe (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial IntelligenceSep-23-2025

The Better You Learn, The Smarter You Prune: Towards Efficient Vision-language-action Models via Differentiable Token Pruning

Jiang, Titong, Jiang, Xuefeng, Ma, Yuan, Wen, Xin, Li, Bailin, Zhan, Kun, Jia, Peng, Liu, Yahui, Sun, Sheng, Lang, Xianpeng

We present LightVLA, a simple yet effective differentiable token pruning framework for vision-language-action (VLA) models. While VLA models have shown impressive capability in executing real-world robotic tasks, their deployment on resource-constrained platforms is often bottlenecked by the heavy attention-based computation over large sets of visual tokens. LightVLA addresses this challenge through adaptive, performance-driven pruning of visual tokens: It generates dynamic queries to evaluate visual token importance, and adopts Gumbel softmax to enable differentiable token selection. Through fine-tuning, LightVLA learns to preserve the most informative visual tokens while pruning tokens which do not contribute to task execution, thereby improving efficiency and performance simultaneously. Notably, LightVLA requires no heuristic magic numbers and introduces no additional trainable parameters, making it compatible with modern inference frameworks. Experimental results demonstrate that LightVLA outperforms different VLA models and existing token pruning methods across diverse tasks on the LIBERO benchmark, achieving higher success rates with substantially reduced computational overhead. Specifically, LightVLA reduces FLOPs and latency by 59.1% and 38.2% respectively, with a 2.6% improvement in task success rate. Meanwhile, we also investigate the learnable query-based token pruning method LightVLA* with additional trainable parameters, which also achieves satisfactory performance. Our work reveals that as VLA pursues optimal performance, LightVLA spontaneously learns to prune tokens from a performance-driven perspective. To the best of our knowledge, LightVLA is the first work to apply adaptive visual token pruning to VLA tasks with the collateral goals of efficiency and performance, marking a significant step toward more efficient, powerful and practical real-time robotic systems.

large language model, machine learning, natural language, (17 more...)

2509.12594

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.31)